13.2 Botryology


13.2.2 Principal Component and Linear Discriminant Analyses

The underlying concept of principal component analysis (PCA) is that the higher the variance of a feature, the more information that feature carries. PCA, therefore, linearly transforms a dataset in order to maximize the retained variance while minimizing the number of dimensions used to represent the data, which are projected onto the lower-dimensional (most usefully two-dimensional) space.

The optimal approximation (in the sense of minimizing the least-squares error) of a $D$-dimensional random vector $\mathbf{x} \in \mathbb{R}^D$ by a linear combination of $D' < D$ independent vectors is achieved by projecting $\mathbf{x}$ onto the eigenvectors (called the principal axes of the data) corresponding to the largest eigenvalues of the covariance (or scatter) matrix of the data represented by $\mathbf{x}$. The projections are called the principal components. Typically, it is found that one, two, or three principal axes account for the overwhelming proportion of the variance; the sought-for reduction of dimensionality is then achieved by discarding all of the other principal axes.
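The eigendecomposition described above can be sketched in a few lines; the following is a minimal illustration (assuming NumPy; the function name `pca` and the synthetic data are for illustration only):

```python
import numpy as np

def pca(X, n_components=2):
    """Project data onto the principal axes: the eigenvectors of the
    covariance matrix corresponding to the largest eigenvalues.
    X: (n_samples, D) data matrix; returns (n_samples, n_components)."""
    # Centre the data so the covariance is computed about the mean.
    Xc = X - X.mean(axis=0)
    # Covariance (scatter) matrix of the D features.
    cov = np.cov(Xc, rowvar=False)
    # eigh is appropriate for symmetric matrices; eigenvalues ascend.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Sort principal axes by decreasing eigenvalue (retained variance).
    order = np.argsort(eigvals)[::-1]
    axes = eigvecs[:, order[:n_components]]
    # The projections onto the principal axes are the principal components.
    return Xc @ axes

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # illustrative 5-dimensional data
Z = pca(X, n_components=2)      # reduced to two dimensions
```

Discarding all but the leading axes corresponds here to keeping only the first `n_components` columns after sorting by eigenvalue.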

The weakness of PCA is that there is no guarantee that any clusters (classes) that may be present in the original data are better separated under the transformation. This problem is addressed by linear discriminant analysis (LDA), in which a transformation of $\mathbf{x}$ is sought that maximizes intercluster distances (e.g., the variance between classes) and minimizes intracluster distances (e.g., the variance within classes).
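For two classes this criterion reduces to Fisher's linear discriminant: the direction $\mathbf{w} \propto S_W^{-1}(\mathbf{m}_1 - \mathbf{m}_0)$, where $S_W$ is the pooled within-class scatter and $\mathbf{m}_0, \mathbf{m}_1$ the class means. A minimal sketch (assuming NumPy; the function name and data are illustrative):

```python
import numpy as np

def fisher_lda_direction(X0, X1):
    """Two-class Fisher discriminant: direction w maximizing the
    between-class variance relative to the within-class variance,
    w proportional to S_W^{-1} (m1 - m0)."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter: sum of each class's scatter matrix.
    S_W = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.solve(S_W, m1 - m0)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(1)
X0 = rng.normal(loc=[0.0, 0.0], size=(50, 2))  # class 0
X1 = rng.normal(loc=[3.0, 3.0], size=(50, 2))  # class 1
w = fisher_lda_direction(X0, X1)
# Projecting both classes onto w separates their means while keeping
# each class's projected spread small.
```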

13.2.3 Wavelets

Most readers will be familiar with the representation of arbitrary functions using Fourier series, namely an infinite sum of sines and cosines (called Fourier basis functions).⁷ This work engendered frequency analysis. A Fourier expansion transforms a function from the time domain into the frequency domain. It is especially appropriate for a periodic function (i.e., one that is localized in frequency), but is cumbersome for functions that tend to be localized in time. Wavelets, as the name suggests, integrate to zero and are well localized. They enable complex functions to be analysed according to scale; as Graps (1995) points out, they enable one to see "both the forest and the trees". They are particularly well suited for representing functions with sharp discontinuities, and they embody what might be called scale analysis.
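The two defining properties just mentioned, zero integral and localization, are easily seen in the simplest wavelet, the Haar wavelet. A minimal numerical check (assuming NumPy; the Haar wavelet itself is standard, the grid is illustrative):

```python
import numpy as np

def haar(x):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere.
    It is compactly supported (well localized) and integrates to zero."""
    x = np.asarray(x, dtype=float)
    return np.where((x >= 0.0) & (x < 0.5), 1.0,
                    np.where((x >= 0.5) & (x < 1.0), -1.0, 0.0))

# Midpoint-rule integral over [-1, 2]; the wavelet vanishes outside
# [0, 1), so this covers its whole support.
dx = 0.001
x = -1.0 + (np.arange(3000) + 0.5) * dx
integral = haar(x).sum() * dx
```

The positive and negative halves cancel exactly, and the support is a single unit interval, in contrast to the globally supported Fourier basis functions.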

The starting point is to adopt a wavelet prototype function (the analysing or mother wavelet) $\Phi(x)$. Temporal analysis uses a contracted, high-frequency version

⁷ Fourier's assertion was that any $2\pi$-periodic function can be expanded as $f(x) = a_0 + \sum_{k=1}^{\infty} (a_k \cos kx + b_k \sin kx)$. The coefficients are defined as $a_0 = (2\pi)^{-1} \int_0^{2\pi} f(x)\,dx$, $a_k = \pi^{-1} \int_0^{2\pi} f(x)\cos(kx)\,dx$, and $b_k = \pi^{-1} \int_0^{2\pi} f(x)\sin(kx)\,dx$.
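The coefficient formulas in the footnote can be verified numerically for a function whose expansion is known in advance. A minimal sketch (assuming NumPy; the test function $f(x) = 3 + 2\cos x + 5\sin 2x$ and the function name are illustrative):

```python
import numpy as np

def fourier_coefficients(f, k_max, n=100_000):
    """Evaluate a0 = (2π)^{-1} ∫ f dx, ak = π^{-1} ∫ f cos(kx) dx,
    and bk = π^{-1} ∫ f sin(kx) dx over [0, 2π] by an equispaced
    Riemann sum (essentially exact for trigonometric polynomials)."""
    x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    dx = 2.0 * np.pi / n
    fx = f(x)
    a0 = fx.sum() * dx / (2.0 * np.pi)
    ak = [(fx * np.cos(k * x)).sum() * dx / np.pi for k in range(1, k_max + 1)]
    bk = [(fx * np.sin(k * x)).sum() * dx / np.pi for k in range(1, k_max + 1)]
    return a0, ak, bk

# f(x) = 3 + 2 cos x + 5 sin 2x should yield a0 = 3, a1 = 2, b2 = 5,
# with all other coefficients zero.
a0, ak, bk = fourier_coefficients(
    lambda x: 3.0 + 2.0 * np.cos(x) + 5.0 * np.sin(2.0 * x), k_max=2)
```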